Round 1: Machine Coding / Spark Coding
📍 Task Description: Given four nested JSON files, each representing a table, write a Spark program that produces the required result using joins and SQL-like queries.
📍 Focus Areas:
🔹Spark optimizations
🔹Caching strategies
🔹Handling Out of Memory (OOM) errors
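For anyone preparing for this round, here is a rough sketch of the usual shape of such a solution: flatten the nested arrays, join the large table against a small lookup table, then aggregate. It is plain Python (no Spark) so it runs anywhere. The tables, fields, and function names are made up for illustration; the actual interview files are not reproduced here. `broadcast_hash_join` mimics what Spark does in a broadcast join: build a hash map of the small side once, then stream the large side through it.

```python
import json
from collections import defaultdict

# Hypothetical nested records standing in for two of the four JSON "tables".
ORDERS_JSON = """[
  {"order_id": 1, "customer": {"id": 10},
   "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
  {"order_id": 2, "customer": {"id": 11}, "items": [{"sku": "A", "qty": 5}]}
]"""
PRODUCTS_JSON = """[
  {"sku": "A", "price": {"amount": 3.0}},
  {"sku": "B", "price": {"amount": 10.0}}
]"""


def explode_items(orders):
    """Flatten each order's nested items array into one row per item (like Spark's explode)."""
    for order in orders:
        for item in order["items"]:
            yield {
                "order_id": order["order_id"],
                "customer_id": order["customer"]["id"],
                "sku": item["sku"],
                "qty": item["qty"],
            }


def broadcast_hash_join(big_rows, small_rows, key):
    """Inner join: hash the small side once (keys assumed unique), then probe it per big-side row."""
    lookup = {row[key]: row for row in small_rows}
    for row in big_rows:
        match = lookup.get(row[key])
        if match is not None:
            yield {**row, **match}


def revenue_per_customer(orders, products):
    """Flatten, join, and aggregate: total qty * price per customer."""
    products_flat = [{"sku": p["sku"], "price": p["price"]["amount"]} for p in products]
    totals = defaultdict(float)
    for row in broadcast_hash_join(explode_items(orders), products_flat, "sku"):
        totals[row["customer_id"]] += row["qty"] * row["price"]
    return dict(totals)


result = revenue_per_customer(json.loads(ORDERS_JSON), json.loads(PRODUCTS_JSON))
print(result)
```

In PySpark the same steps map roughly to `explode` on the array column, a `broadcast()` hint on the small DataFrame, and `groupBy().agg()`, with `.cache()` on any DataFrame that several queries reuse. Broadcasting only helps while the small side comfortably fits in memory; broadcasting an oversized table is a common source of driver or executor OOM, which is why Spark only broadcasts automatically below `spark.sql.autoBroadcastJoinThreshold`.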
Round 2: Data Modeling
📍 Task Description: Design a comprehensive data model to capture information for a cricket tournament, including teams, players, matches, scores, and stadiums. Note that players may be part of multiple leagues and national teams.
📍 Expected Outputs:
🔹Calculate a player's cumulative score across all matches in the tournament.
🔹Answer various SQL queries against the designed data model.
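A minimal sketch of one possible schema, using SQLite from the Python standard library so it runs without a database server. The table and column names are my own assumptions, not the interviewer's expected answer. The `player_team` bridge table handles players who belong to several league sides and a national team, and `player_score` stores one row per player per match, so the cumulative score is a `SUM` over that tournament's matches. The running-total query uses a window function, which needs SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical schema for illustration; names and columns are assumptions.
SCHEMA = """
CREATE TABLE stadium (stadium_id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT);
CREATE TABLE team (
    team_id   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    team_type TEXT NOT NULL CHECK (team_type IN ('LEAGUE', 'NATIONAL'))
);
CREATE TABLE player (player_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Bridge table: one player can belong to many teams (league sides and a national side).
CREATE TABLE player_team (
    player_id INTEGER REFERENCES player(player_id),
    team_id   INTEGER REFERENCES team(team_id),
    PRIMARY KEY (player_id, team_id)
);
CREATE TABLE tournament (tournament_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE cricket_match (
    match_id      INTEGER PRIMARY KEY,
    tournament_id INTEGER REFERENCES tournament(tournament_id),
    stadium_id    INTEGER REFERENCES stadium(stadium_id),
    home_team_id  INTEGER REFERENCES team(team_id),
    away_team_id  INTEGER REFERENCES team(team_id),
    match_date    TEXT NOT NULL
);
-- One row per player per match: runs scored and the team played for in that match.
CREATE TABLE player_score (
    match_id  INTEGER REFERENCES cricket_match(match_id),
    player_id INTEGER REFERENCES player(player_id),
    team_id   INTEGER REFERENCES team(team_id),
    runs      INTEGER NOT NULL,
    PRIMARY KEY (match_id, player_id)
);
"""

# Total runs per player, restricted to one tournament.
TOTAL_RUNS_SQL = """
SELECT p.name, SUM(s.runs) AS total_runs
FROM player_score s
JOIN cricket_match m ON m.match_id = s.match_id
JOIN player p ON p.player_id = s.player_id
WHERE m.tournament_id = ?
GROUP BY p.player_id, p.name
ORDER BY total_runs DESC
"""

# Match-by-match running total for one player within one tournament.
RUNNING_TOTAL_SQL = """
SELECT m.match_date, s.runs,
       SUM(s.runs) OVER (ORDER BY m.match_date) AS running_total
FROM player_score s
JOIN cricket_match m ON m.match_id = s.match_id
WHERE m.tournament_id = ? AND s.player_id = ?
ORDER BY m.match_date
"""


def build_demo_db():
    """Create an in-memory database populated with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO team VALUES (?, ?, ?)", [
        (1, "Mumbai", "LEAGUE"), (2, "Chennai", "LEAGUE"),
        (3, "India", "NATIONAL"), (4, "Australia", "NATIONAL")])
    conn.executemany("INSERT INTO player VALUES (?, ?)", [(1, "Player A"), (2, "Player B")])
    # Player A plays for a league side and the national side.
    conn.executemany("INSERT INTO player_team VALUES (?, ?)", [(1, 1), (1, 3), (2, 2)])
    conn.executemany("INSERT INTO tournament VALUES (?, ?)",
                     [(1, "League 2024"), (2, "Bilateral Series")])
    conn.execute("INSERT INTO stadium VALUES (1, 'Stadium X', 'City Y')")
    conn.executemany("INSERT INTO cricket_match VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 1, 1, 1, 2, "2024-04-01"), (2, 1, 1, 2, 1, "2024-04-05"),
        (3, 1, 1, 1, 2, "2024-04-09"), (4, 2, 1, 3, 4, "2024-06-01")])
    conn.executemany("INSERT INTO player_score VALUES (?, ?, ?, ?)", [
        (1, 1, 1, 45), (2, 1, 1, 12), (3, 1, 1, 80), (4, 1, 3, 100),  # Player A
        (1, 2, 2, 30), (3, 2, 2, 50)])                                 # Player B
    return conn


conn = build_demo_db()
print(conn.execute(TOTAL_RUNS_SQL, (1,)).fetchall())
print(conn.execute(RUNNING_TOTAL_SQL, (1, 1)).fetchall())
```

Player A's 100 runs for India fall in a different tournament, so they are excluded from the tournament 1 totals. A fuller model would likely split scores further (innings, balls, wickets), but the aggregation pattern stays the same.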
Round 3: Data Pipeline Handling and Spark Utilization
📍 Topics of Discussion:
🔹Various scenarios that come up when handling data pipelines.
🔹In-depth discussion of processing data from Kafka topics.
🔹Spark features including dynamic allocation, caching, and joins.
🔹Project-related discussions.
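On dynamic allocation, these are standard Spark properties that tend to come up, written as a Python dict and rendered as `spark-submit` flags. The values are illustrative assumptions, not recommendations, and `to_spark_submit_args` is a hypothetical helper. The last setting ties back to caching: by default, executors holding cached blocks are never released for idleness, so a job that caches heavily can hold on to executors it no longer needs.

```python
# Illustrative values only (assumptions); tune min/max executors to the cluster and workload.
DYNAMIC_ALLOCATION_CONF = {
    "spark.dynamicAllocation.enabled": "true",
    # Spark 3.0+: track shuffle files so executors can be released without an external shuffle service.
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "2",
    "spark.dynamicAllocation.maxExecutors": "20",
    # Release executors that have been idle for this long.
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
    # Executors holding cached blocks default to an infinite idle timeout; cap it explicitly.
    "spark.dynamicAllocation.cachedExecutorIdleTimeout": "30min",
}


def to_spark_submit_args(conf):
    """Hypothetical helper: render a conf dict as `--conf key=value` pairs, sorted by key."""
    args = []
    for key in sorted(conf):
        args.extend(["--conf", f"{key}={conf[key]}"])
    return args


print(" ".join(to_spark_submit_args(DYNAMIC_ALLOCATION_CONF)))
```

Sorting the keys keeps the generated command stable across runs, which makes it easier to diff job configurations.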
Round 4: Behavioral Interview
📍 Focus: General behavioral questions to assess soft skills and cultural fit.